HTML dict: enable color rendering and per-dict user fix html func #3585
Also fixes a possible crash in HtmlBoxWidget:free().
I'm not entirely sure that I can just yet. :-P On my H2O I found I wasn't able to scroll, even though it works in the emulator.
Confirmed: swipe scrolling does not work in scrollhtmlwidget (I mostly scroll with taps). It doesn't scroll in the emulator for me either.
Should I add it to this PR, or would you prefer a separate one?
I suppose a separate quickie is better for the history.
Another crash that I figure should theoretically be avoided is the error being thrown in koreader/frontend/ui/widget/htmlboxwidget.lua (lines 52 to 54 at 8c897a0).
But that's more of a hypothetical thing. Since I'm not completely sure what the most user-friendly thing to do is, other than not closing the program,[1] I won't throw in a quickie for that one, and I won't let it hold up anything tomorrow. ;-)
[1] Retry yet again with known-good HTML saying something about an error? Close the widget and show some kind of InfoMessage instead? Leave it white with an InfoMessage? Some other option?
I don't really know.
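Option [1] from the list above (retrying with known-good HTML instead of letting the error close the program) could look roughly like this with Lua's pcall. This is only a sketch under assumptions: `render` is a hypothetical stand-in for the real HtmlBoxWidget loading call, and the fallback text is made up.

```lua
-- Hypothetical stand-in for the real HTML loading/rendering call:
-- it raises an error on input it cannot handle, here anything with "<bad>".
local function render(html)
    if html:find("<bad>", 1, true) then
        error("could not parse HTML")
    end
    return "rendered:" .. html
end

-- Known-good HTML used when the dictionary's HTML fails (made-up text).
local FALLBACK_HTML = "<p>Could not render this definition.</p>"

-- Try the dictionary's HTML first; if rendering raises an error, retry
-- once with the fallback HTML instead of letting the error propagate.
-- Returns the rendered result and, on fallback, the original error message.
local function safe_render(html)
    local ok, result = pcall(render, html)
    if ok then
        return result, nil
    end
    return render(FALLBACK_HTML), result
end
```

Closing the widget and showing an InfoMessage instead would follow the same pcall pattern; only the failure branch changes.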
Sharing the HTML fix funcs I made for the 3 French HTML dicts I have (each goes alongside its .ifo file, with the same name but a .lua extension instead of .ifo):
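A minimal sketch of what such a per-dictionary fix file might contain, assuming KOReader loads the .lua file sitting next to the .ifo and expects it to return a function that takes the definition's HTML string and returns the fixed HTML. The name `fix_html` and both tweaks (dropping color attributes, turning newlines into `<br/>`) are hypothetical examples, not the contents of the shared files.

```lua
-- Hypothetical per-dictionary fix function, e.g. MyDict.ifo -> MyDict.lua.
-- Assumption: KOReader calls the function returned by this file with the
-- definition's HTML (a string) and renders whatever string it returns.
local function fix_html(html)
    -- Example cosmetic tweak: drop color="..." attributes.
    -- (gsub returns the new string plus a count; keep only the string.)
    html = html:gsub('%s+color="[^"]*"', "")
    -- Example structural tweak: turn raw newlines into line breaks.
    html = html:gsub("\n", "<br/>")
    return html
end

-- The actual .lua file would end with:  return fix_html
```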
We should look into a sensible way of including the XML Littré fix. (What's the problem with something that explicitly has XML in the name? Some custom XML DTD rather than HTML?)
Yes, some XML that is not fully HTML. For example, in my .lua:
(Plus some other cosmetic changes to my taste, like replacing colors with smaller italic fonts, etc.)
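Since the Littré entries use custom XML elements rather than HTML, one simple approach inside such a fix function is to rename known elements to HTML ones and leave everything else alone. A sketch follows; the element names in `TAG_MAP` are placeholders, not the dictionary's actual tags, and `xml_to_html` is a hypothetical name.

```lua
-- Hypothetical mapping from custom XML element names to HTML ones.
-- The names on the left are placeholders, not the real Littré tags.
local TAG_MAP = {
    headword = "b",
    example  = "i",
    sense    = "div",
}

local function xml_to_html(text)
    -- Match opening, closing and self-closing tags; keep attributes as-is.
    text = text:gsub("<(/?)([%w_]+)([^>]*)>", function(slash, name, rest)
        local html_name = TAG_MAP[name]
        if html_name then
            return "<" .. slash .. html_name .. rest .. ">"
        end
        -- Returning nil tells gsub to keep the original match (unknown tag).
        return nil
    end)
    return text
end
```

Attributes are carried over unchanged, so any attribute MuPDF doesn't understand would still need separate handling.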
Hm, how old is that dictionary? It's being continuously corrected, although it looks like it remains an XML version of ye olde Littré, see here. Is there a script somewhere that automates the conversion from the XML Littré source to StarDict? There's https://bitbucket.org/Mytskine/xmlittre-web, but I'm not seeing any XSLT in that repo. Anyway, it's just something one could consider as a lunch or breakfast project. :-P
Oh, I didn't know that. The files in mine are dated 2006 :)
Those curious could investigate whether https://github.com/leafo/web_sanitize could be of any use. Although its purpose is sanitizing (i.e., stripping scripts and styles), it seems to handle unclosed inner tags. It's a fair bit smaller than libtidy. :-P It depends on LPeg, but we already include that for lua-Spore.
Note that MuPDF's XHTML support is limited.
I wrote a Lua script to [test whether it is possible to] correctly render Persian/Arabic strings in definitions, but in my tests KOReader doesn't even load my script. Could you help me, please?
Shouldn't MuPDF already do that? If you're talking about the KOReader GUI elements, they don't, but I'd probably look into leveraging HarfBuzz. Last year we didn't include it, but now we do: https://github.com/luapower/harfbuzz (Also: https://github.com/ufyTeX/luaharfbuzz)
You didn't say what you're trying to load or how you're using it.
No, it doesn't. The letters are separated and laid out left to right.
What exactly doesn't it do, and in what scenario? See, e.g., #1426 (comment).
That could be an issue somewhere in how we build or use MuPDF. I'm reasonably sure it's supposed to fall back to Noto.
I even tried to set the font explicitly through CSS, but it seems MuPDF doesn't respect the font-family property at all (verified on PC).
Setting it where and how? If you're testing MuPDF, I would stick to the basics (e.g., a simple XHTML file) without complicating matters through the dictionary. ;-) If you want to test the dictionary, see, e.g., koreader/frontend/ui/widget/dictquicklookup.lua, lines 173 to 184 at 812e595.
Which makes it really weird that we're seeing undefined characters; maybe it's actually some issue in the sdcv output or how it's passed on? It definitely listens to the font there, because otherwise you'd be seeing some serif typeface instead of Noto Sans. :-)
My guess is that the output is plain garbage, and not MuPDF failing to display Farsi ;) (i.e., I'd double-check the sanity of the dictionary and try the sdcv CLI, as @Frenzie suggested).
Tried a CSS with … Now Farsi characters are displayed as blank space (or nothing!). I'm sure the characters are there, because when I touch and hold them, KOReader looks up their definition and displays them there.
With a painfully crafted PetitRobert2007_1.1.lua (updated 20180115), I managed to turn this:
(the HTML returned is really, really bad, full of errors and inconsistencies)
into:
Too bad I won't be using it, as it turns out this PetitRobert2007 does not return all definitions (i.e., you get "2. Critique", the 2nd definition for the word critique, which suggests going to see "1. Critique", but you never get that one in the results...).
Edit: just realized the PetitRobert2003 returns "1. Critique" but not any "2. Critique"... so it's either a bug from when these 2 dicts were made, or something in sdcv.
Edit: OK, it's an sdcv bug already reported in Dushistov/sdcv#30.
(The collect/serialize approach I used in this file to fix HTML tag balance seems OK, but there's a good chance it won't work as-is elsewhere and would need individual tweaks, so I don't think we can or should use it by default as an alternative to htmltidy for all HTML output that MuPDF wouldn't like.)
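The collect/serialize idea can be illustrated with a small stack-based balancer: collect tags as they appear, drop closing tags with no open match, close intermediate tags when an outer one closes, and close anything left open at the end. This is a hedged sketch, not the code from PetitRobert2007_1.1.lua: `balance_tags` and the `VOID` list are hypothetical, and it ignores comments, CDATA and `>` inside attribute values, which is part of why a general drop-in for htmltidy is harder than it looks.

```lua
-- Void elements never get a closing tag (partial list, for illustration).
local VOID = { br = true, hr = true, img = true, meta = true, link = true, input = true }

local function balance_tags(html)
    local out, stack = {}, {}
    local pos = 1
    while true do
        local s, e, slash, name, rest = html:find("<(/?)([%a][%w]*)([^>]*)>", pos)
        if not s then break end
        -- Copy the text preceding this tag unchanged.
        out[#out + 1] = html:sub(pos, s - 1)
        name = name:lower()
        if slash == "/" then
            -- Closing tag: find the innermost matching open tag.
            local idx
            for i = #stack, 1, -1 do
                if stack[i] == name then idx = i; break end
            end
            if idx then
                -- Close everything opened after it, then the tag itself.
                for i = #stack, idx, -1 do
                    out[#out + 1] = "</" .. stack[i] .. ">"
                    stack[i] = nil
                end
            end
            -- A stray closing tag with no open match is dropped.
        else
            out[#out + 1] = html:sub(s, e)
            local self_closing = rest:sub(-1) == "/"
            if not VOID[name] and not self_closing then
                stack[#stack + 1] = name
            end
        end
        pos = e + 1
    end
    out[#out + 1] = html:sub(pos)
    -- Close whatever is still open at the end of the input.
    for i = #stack, 1, -1 do
        out[#out + 1] = "</" .. stack[i] .. ">"
    end
    return table.concat(out)
end
```

For example, `balance_tags("<b>bold <i>both</b> after")` yields `<b>bold <i>both</i></b> after`.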
(@Frenzie: you can release this morning's builds without this, no problem.)